Implement @fastmath #9406

Merged: jakebolewski merged 7 commits into JuliaLang:master from eschnett:fast-math on Jan 13, 2015

Conversation

eschnett
Contributor

This implements a macro @fastmath that works similarly to @inbounds: expressions evaluated this way have LLVM's "fast-math" flag set, which is equivalent to calling clang with the -ffast-math option.

I experimented with two different ways to implement this. Both work, but obviously only one of them should be kept; the other should be removed.

Implementation 1: Math functions such as + and - have fast-math equivalents called add_fast, sub_fast, etc. These have their own intrinsics. A macro @fastmath (easy to implement, but not yet done) would convert math operations to their fast forms. Advantage: This gives programmers full control over when to use fast-math operations and when not to. Disadvantage: The macro needs to convert a parse tree, which may be slow. I believe @simd goes this way.

Implementation 2: The Julia compiler maintains a global state variable that describes whether fast-math is in effect or not. The @fastmath macro (this is how it is currently implemented) toggles this flag. Advantage: Very fast. Disadvantage: It uses global state, and I can't tell whether the toggling happens at the right times. This is equivalent to how @inbounds works.

Please comment both on the general idea of having @fastmath (I believe there is consensus that this is a good and necessary thing) and on the implementations. I will continue to update the branch with suggestions.

@tkelman
Contributor

tkelman commented Dec 19, 2014

This should probably not be 36 commits. There's a lot of extra stuff here: a double-conversion tarball and help changes from your fma branch. It needs a rebase to clean it up next time you update the PR.

@eschnett
Contributor Author

Here is an example of @fastmath in action:

julia> slow(x) = x+1.0+2.0
slow (generic function with 1 method)

julia> code_native(slow, (Float64,))
      .section        __TEXT,__text,regular,pure_instructions
Filename: none
Source line: 1
        push    RBP
        mov     RBP, RSP
        movabs  RAX, 4484128304
Source line: 1
        vaddsd  XMM0, XMM0, QWORD PTR [RAX]
        movabs  RAX, 4484128312
        vaddsd  XMM0, XMM0, QWORD PTR [RAX]
        pop     RBP
        ret

julia> fast(x) = @fastmath x+1.0+2.0
fast (generic function with 1 method)

julia> code_native(fast, (Float64,))
      .section        __TEXT,__text,regular,pure_instructions
Filename: none
Source line: 1
        push    RBP
        mov     RBP, RSP
        movabs  RAX, 4484151696
Source line: 1
        vaddsd  XMM0, XMM0, QWORD PTR [RAX]
        pop     RBP
        ret

As you can see, @fastmath allowed the compiler to re-associate the floating-point additions, so that x+1.0+2.0, which is really (x+1.0)+2.0, is changed to x+(1.0+2.0), which is then constant-folded to x+3.0. (The floating-point add operations are the vaddsd instructions.)
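
To see why this needs an explicit opt-in: IEEE floating-point addition is not associative, so the compiler may not re-associate on its own. A quick illustration (values chosen for illustration, not taken from this PR):

julia> (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)
false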

@eschnett
Contributor Author

@tkelman I'll be happy to rebase, but I'd really like feedback first. If you look at the diff, you'll see that there's nothing unrelated there, except the unmerged muladd and fma changes, which are very similar in spirit and could be subsumed in this change. The commit history is mostly from merging master into the branch (and resolving conflicts).

@jakebolewski
Member

This is great! With @inbounds, the emerging decision seems to be that the current behavior is a mistake. A global flag causes all methods called within the block to be compiled with @fastmath semantics as well.

Ex:

baz(x::Int) = ....
bar(x::Int) = ....
function foo(x::Int)
    @fastmath begin
        bar(baz(x))
    end
end

bar and baz could be compiled with @fastmath semantics, which is not necessarily what you want. Implementation 1 does not have this problem, although it has the downside that you need to special-case all the operations you want to support in the macro. With @inbounds, the analogous choice would be to substitute getindex with unsafe_getindex, so this is not much of a problem.

@ViralBShah
Member

This is great. I would love to have the -ffast-math command line switch as well.

@tkelman
Contributor

tkelman commented Dec 19, 2014

I would love to have the -ffast-math command line switch as well.

Please no. We have too many command-line switches. If it can be done as a macro, why does it need to be done as a command-line switch as well?

@vtjnash
Member

vtjnash commented Dec 19, 2014

I agree that it would be bad to add this as a command-line switch, and preferable to require a function to explicitly state that it is amenable to fast-math. For consideration, I think the implementations of @inbounds, @unchecked, and @fastmath should all be the same, per the proposal in #8227 (comment)

@eschnett
Contributor Author

I rewrote the @fastmath macro so that it does not rely on internal state of the compiler; instead, it rewrites the expression tree to replace all relevant operators by equivalent ones with a _fast suffix, e.g. + -> add_fast.
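
A minimal sketch of this rewriting approach, to make the idea concrete (the helper names and the operator table here are illustrative, not the exact code in this PR, and it assumes the *_fast functions are in scope):

# Sketch only: map a few operators to their fast-math counterparts.
const _fast_op = Dict(:+ => :add_fast, :- => :sub_fast,
                      :* => :mul_fast, :/ => :div_fast)

_make_fast(x) = x   # leave literals and symbols unchanged
function _make_fast(expr::Expr)
    if expr.head == :call && haskey(_fast_op, expr.args[1])
        Expr(:call, _fast_op[expr.args[1]], map(_make_fast, expr.args[2:end])...)
    else
        Expr(expr.head, map(_make_fast, expr.args)...)
    end
end

macro fastmath(expr)
    esc(_make_fast(expr))
end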

If this design looks good, then I'll clean up the patch to remove the other implementation and make it ready to merge.

We can switch to the implementation in #8227 once it is available. The core of the patch here is the machinery for talking to LLVM to generate fast-math operators, and the specification of which math functions can/should be made fast. How the actual @fastmath macro works is tangential to this.

@eschnett
Contributor Author

Note that the build failures above are caused by an unrelated problem.

@tkelman
Contributor

tkelman commented Dec 21, 2014

The Travis failure was due to trailing whitespace on comment lines. We should probably do that check with a separate service so it shows up as a different status item and doesn't preclude running the build and tests. The AppVeyor failure was due to an upgrade of the Git version on the build workers (in light of the recent vulnerability) that did not match the previous configuration, so the build script could no longer find MSYS sh.exe on the path. Looks fixed now.

Please fix the whitespace so the Travis tests can actually run on this, and preferably do a rebase next time you update the PR so there are fewer commits and it's easier to review without looking at all 19 changed files at once.

@ViralBShah
Member

A global switch to turn on fastmath, as is provided in many C compilers, is not terribly useful. In this particular case, a command-line switch to turn off fastmath globally would be useful for two purposes: without changing the code, the effects of fastmath and not having fastmath can be tested by just starting julia with different command line options, and it is also useful for testing.

@eschnett
Contributor Author

@tkelman I fixed the white space and rebased the branch.

I don't think rebasing the branch helped. There are now 22 instead of 19 changed files. A batch of changes to the autogenerated helpdb.jl crept in, which I had to re-create due to conflicts during rebasing. Also, some unrelated changes appeared -- I don't know why; maybe I resolved conflicts wrongly while rebasing. You still have 23 commits to look at, and since I used git push -f, the original branch is now lost.

You seem quite keen on rebasing, and on having a small number of commits -- you asked me twice. I understand that that's good to have when pulling a change back into master. But during review -- does this actually help? I now have to go through the changes above manually and carefully review them, removing those changes that are unrelated and shouldn't be there. I don't think I produced anything that you would find useful. I consider this exercise a waste of time.

It's obvious that I didn't go through the commits above while rebasing to ensure that all of them built and tested fine; that would have amounted to re-developing everything from scratch. (My original commits did, of course.)

It's a fact of life that I make my changes to Julia over the course of time, not instantaneously. My history will thus always include merges from master, and will look messy. There isn't much value in rebasing things to make it look as if I made my changes instantaneously. I'm happy to squash my changes before the final pull, but for everything else you have two choices:

  • Ignore my history, and look at the diff
  • Look at my history, and accept that it's not linear

@tkelman
Contributor

tkelman commented Dec 21, 2014

But during review -- does this actually help? I now have to go through the changes above manually and carefully review them, removing those changes that are unrelated and shouldn't be there.

You had to resolve the same conflicts that you were earlier resolving via merge commits. Rebasing and merging are two different ways of updating a pull request against master. I'll stop asking if it's more time-consuming for you, but I greatly prefer looking at a pull request that has been updated via rebasing rather than via merging. When the history has a large number of tiny "correct typo" or "undo change" commits, it's more tedious to look at the individual commits, and I'll tend not to look at any of them individually; rebasing also gives you an opportunity to squash those so reviewers don't have to look at the insignificant ones. Looking over the entire diff at once can be a bit much to review when the change is nontrivial. If the pull request is a smaller number of mostly-independent commits, then looking at the individual commits is an effective way to break the change down into smaller, individually reviewable pieces. The development history of how you worked on the change is not necessarily the most effective way to present it for others to look at.

But this is a workflow concern that not everyone agrees on, and is less important than the actual substance here. Which seems mostly fine, though in need of more tests to verify the functionality is actually working.

@eschnett
Contributor Author

Writing tests for @fastmath is interesting. Correctness tests are "easy"; one re-uses existing tests with reduced accuracy constraints. However, actually checking that things are faster (and not slower) is difficult. Looking at disassembled code is one way...
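
For instance, a rough sketch of a reduced-accuracy correctness test (the tolerances and test expressions here are made up for illustration, not taken from this PR):

using Base.Test

# @fastmath results should stay close to the IEEE results,
# just not necessarily bit-identical.
for x in (0.1, 0.5, 1.0, 2.0, 10.0)
    @test abs((@fastmath x + 1.0 + 2.0) - (x + 1.0 + 2.0)) <= 1e-10
    @test isapprox(@fastmath(sin(x)^2 + cos(x)^2), 1.0; rtol=1e-10)
end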

@ViralBShah
Member

I think a performance test would be tough. How about tests that verify known expressions, where fast math gives an expected different result?

BTW, do you have a specific use case for fastmath? Just curious about it, if there is one driving this.

@eschnett
Contributor Author

@tkelman I realize where our workflows differ. If I had started with a branch that contained only fast-math related work, then I could have rebased instead of merged, as you suggested. However, I usually use a master branch that contains several sets of changes, some of them from pull requests. When I implement a new feature, there is thus no "clean start" that would allow this rebasing. I then create a pull-request branch by cherry-picking and then fixing things up. This leads to the messy history.

@eschnett
Contributor Author

@ViralBShah My use case is generating efficient code for numerical calculations, such as solving PDEs on a grid. There, re-arranging floating-point operations can greatly improve performance. When I write down the original code, I do not choose any particular order of operations -- I just write code that "looks simple" or that "mimics the equations". Since there is no order that I prefer, I want to give the compiler all the freedom it wants in re-arranging the expressions. This is similar to @simd, but more general in that it affects not only SIMD reduction operations but code generation and register allocation in general.
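
For instance, a trimmed-down sketch of the kind of kernel I mean (the function name and the concrete stencil are illustrative, not from my actual application):

# 1D heat-equation-style update; @fastmath lets LLVM re-associate the
# arithmetic and replace the division by dx*dx with a multiplication
# by its reciprocal.
function step!(unew::Vector{Float64}, u::Vector{Float64}, dt, dx)
    @fastmath @inbounds @simd for i in 2:length(u)-1
        unew[i] = u[i] + dt * (u[i-1] - 2*u[i] + u[i+1]) / (dx*dx)
    end
    return unew
end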

@ViralBShah
Member

Thanks for the explanation. How much faster is fastmath in these cases?

@eschnett
Contributor Author

@ViralBShah Here is an example. Somewhat contrived, of course, to make @fastmath shine, but the numbers are real: https://gist.github.com/eschnett/e7a7cb4555143cdd220b

Here is the output:

$ ~/julia/bin/julia wave.jl
elapsed time: 0.983733443 seconds (0 bytes allocated)
4.443986180758243
eschnett@Redshift:~/src/jl (16:10:50)
$ ~/julia/bin/julia --ieee-math=yes wave.jl
elapsed time: 4.298326343 seconds (0 bytes allocated)
4.443986180758243

You see (1) that the output does not change much in this case (not at all, in fact, but that's just a coincidence), and (2) that @fastmath speeds up the code by about a factor of four.

I've used @inbounds and @simd, I pre-allocate all arrays, etc., so that only the floating point calculations are measured. I've also intentionally chosen a problem size that fits into the D1 cache, so that performance is not dominated by memory accesses. Such a setup is not unrealistic -- in my real-world application, the problem would be split into small chunks that fit into the D1 cache, and in each such chunk there are stencil operations that access each array element multiple times.

Truth be told: I don't understand this factor of four. This is on my laptop, with an Intel CPU with AVX instructions. The only difference I see in the disassembled output is that @fastmath turns a division into a multiplication. I would not expect such a speed difference from this.

@@ -79,22 +80,25 @@ static const char *opts =

void parse_opts(int *argcp, char ***argvp)
{
enum optids { check_bounds = 300,
Contributor

4-space indent: https://github.com/JuliaLang/julia/blob/master/CONTRIBUTING.md#general-formatting-guidelines-for-c-code-contributions

The enum is a good idea though, despite my general dislike of proliferating command-line flags and the number of different places they need to be handled and documented.

@tkelman
Contributor

tkelman commented Dec 23, 2014

Without changing the code, the effects of fastmath and not having fastmath can be tested by just starting julia with different command line options, and it is also useful for testing.

I'd much prefer aiming towards being able to globally enable/disable compiler behaviors easily and concisely via Julia code, in addition to the locally scoped operation the current macros provide. We're proliferating too many command-line flags, and they're an unfriendly way to talk to the compiler, especially for people who use Julia through IDEs (likely to become a significant portion of Julia's users as those IDEs get better), who have to go to messy lengths to use command-line flags - http://www.reddit.com/r/Julia/comments/2pz8nj/optimizing_and_using_the_inbounds_macro/cn1l8o8 And it's messy code-wise, doing global state in C but very closely related functionality in Julia code.

@eschnett
Contributor Author

I don't care much about what the interface for switching Julia's global fast-math state should be. Switching it at run time is easy to implement -- the compiler essentially has global variables for such state (jl_compileropts_t in src/julia.h). This holds not just for fast-math, but also for bounds checking, malloc logging, optimization, and several others. These could easily be toggled from Julia.

However, getting good semantics is not trivial, as it may require recompiling all functions, or tagging functions with these flags. Command-line options at least make it obvious that this state is not mutable at run time. In any case, this can easily be added later.

@eschnett
Contributor Author

On Tue, Dec 23, 2014 at 2:04 AM, Tony Kelman wrote:

>> However, I usually use a master branch that contains several sets of
>> changes, some of them from pull requests.
>
> I generally recommend against doing that, and others have as well. I think
> it's a good idea to try to keep your fork's master a clean copy of upstream,
> so you only ever do fast-forward pulls or pushes to your fork's master.
> Then you can branch at any time from upstream master or your fork's master
> and it won't make much difference. It makes more sense to always work in
> topic branches, especially for messy development stuff. Branches are cheap,
> effective, and a good idea to use for almost everything. There's nothing
> wrong with having your own personal dev branch with a bunch of different
> things on it, but calling that master is problematic if that's where you're
> basing pull requests off of.

I have both a "clean master branch" and a "dev branch"; I just named them differently ("upstream-master" and "master"). However, I use my master (your dev) for most of my daily work, since I don't tend to implement new features in isolation: I implement them because I need them, and then I also want to use them -- all of them, even if they are spread across multiple pull requests, or if they are not yet finished enough to be pull requests.

>> When I implement a new feature, there is thus no "clean start" that
>> would allow this rebasing. I then create a pull-request branch by
>> cherry-picking and then fixing things up. This leads to the messy history.
>
> If you're cherry-picking and fixing things up for making a pull request,
> it would be much nicer to review if you could squash away irrelevant /
> unrelated commits. That's primarily what I was asking you to do. If you
> prefer merging over rebasing as a way of resolving conflicts, then you can
> still squash (at least in between merge commits) with the merge commits
> left separate, and not have to resolve the same conflicts over again. Pull
> requests are for other people to look at, not to record your entire
> development history. What ends up in upstream master is often unavoidably
> messy no matter what happens in PRs, but while it's in a PR you at least
> have control over what it looks like for review.

Fair enough. Is there a particular way you recommend for squashing the commits? I know I can squash them all into a single commit, and I plan to do that before the final pull, but that's not useful. (I argue that you can just ignore all my commits instead and look at the overall diff to get the same effect.) I could create new commits from scratch by splitting the diff into pieces -- say, all the changes to src that contain the run-time library changes, then all the changes in base that make up the Julia interface, the changes in ui for the command-line options, and finally all the doc changes. That would be rather easy to do, but I again ask myself whether it's worthwhile -- that division is already there in the overall diffs. I don't think finer-grained commits make sense for this work; the change just isn't large enough. Of course I could do, say, "introduce (unused) math_builder class", "use math_builder class for new intrinsics", "export new intrinsics", but that would be counter-productive since the commits are too small (and probably isn't what you are suggesting).

Can you give me a specific example for how to squash/merge commits, say for my changes above, that wouldn't require me to rebase all "unimportant" changes and resolve a ton of conflicts?

> Homebrew, Node, and several other projects have no-merge policies that
> have advantages in terms of navigating the history. I don't expect Julia
> to adopt such a policy, mostly because it rules out using the convenient
> GitHub UI for merging PRs. But merge commits, which tend to get ignored
> because they're almost always messy, can mask real problems - #9063
> (comment) #9063 (comment)

That's easily done with a final squash before pulling. If I squash every time I update the branch, then people don't see what I changed in response to their comments.

> It's also a good idea not to commit auto-generated files like helpdb.jl,
> to avoid unnecessary conflicts in files that anyone can regenerate at any
> time. It's not critical that master always have the absolute most
> up-to-date version of that file; there's an issue for generating it during
> the build and taking it out of the repository.

The file is there, and it has conflicts. These conflicts arose from rebasing. I never committed any changes, except to resolve conflicts during rebasing. I can now either commit with conflict markers, update it manually (which is unsafe -- I may get it wrong, and the changes still show up), or just update it.

In this discussion, I think we both agree on the advantages of having a clean, global history that doesn't have many rather useless "update from upstream" commits. We also agree that reviewing code is tedious, and one has to pre-chew the changes. We may differ on the amount of code that can be presented to reviewers in a single bite -- I consider my changes to be straightforward enough, especially if you look at the four different subdirectories separately. You may want four separate commits for this. I don't think I'll change my development workflow to use clean feature branches for every Julia-related development.

However, if there is a reasonably easy way to reduce the number of commits I serve to reviewers, I'll be happy to use it (see above). I have now learned that rebasing isn't as straightforward as I expected, and doesn't lead to a clean history either. I'd be happy to squash all commits, but before doing so I'd like to see an overall policy for this, lest I run afoul of some other potential reviewer who then demands the exact opposite ("don't squash your commits -- I want to see what you changed in response to my comment!").

-erik

Erik Schnetter
http://www.perimeterinstitute.ca/personal/eschnetter/

@ViralBShah
Member

I am reasonably sure that we'll want to squash these commits before merging - but for now, let's have others look at it. Any cleanup can be done close to merge time.

@vtjnash vtjnash changed the title from "Implement @fastmath" to "RFC: Implement @fastmath" on Dec 23, 2014
@vtjnash
Member

vtjnash commented Dec 23, 2014

Regardless of how you develop, rebase is an incredibly powerful tool for allowing you to present your change-request story exactly how you want. git rebase -i origin/master is my typical incantation of this particular spell (with various forays into git reflog if I make a mistake), although in some cases git cherry-pick <sha1> is easier. If you structure your commits such that each represents a discrete change, it is much easier to move them around (git rebase, git cherry-pick & friends) and to dissect issues (git blame & friends). Even if you do your changes on top of a working branch (like me), you can later isolate the changes that you wanted for a separate reason and group them into their own pull request. I just learned git add -p, which makes that process even more efficient.

When dealing with git's learning curve, it's also important to realize that every commit is a branch, and similarly, every branch is just a sequence of changes. Some of the branches have names, and some of the names have additional metadata, such as which upstream branch the user wants them associated with. But that is all very mutable: you can quickly name a commit (git checkout -b newbranch) or move that pointer (git reset --hard <sha1>), and you can rearrange the commits however you want at any point (until they get accepted into origin/master and become part of everyone's common history).

That's easily done with a final squash before pulling. If I squash every
time I update the branch, then people don't see what I changed in response
to their comments.

GitHub is remarkably forgiving about maintaining review comments across rebases and force pushes, and even if you delete or move the code entirely, it keeps it as a "comment on deleted code". I wouldn't worry about it.

Can you give me a specific example for how to squash/merge commits, say for
my changes above, that wouldn't require me to rebase all "unimportant"
changes and resolve a ton of conflicts?

The file is there, and it has conflicts. These conflicts arose from
rebasing. I never committed any changes, except to resolve conflicts during
rebasing. I can now either commit with conflict markers, update it manually
(which is unsafe -- I may get it wrong, and the changes still show up), or
just update it.

If the file wasn't supposed to be part of the commit, you can do git reset <filename> to remove it. Don't commit the conflict markers, since the purpose of rebasing is to simplify the history, to help the reviewer, and to assist with future git blame needs. Sometimes I'll do a git rebase -i <sha1 of branch origin> first and get rid of all the non-functional commits, so I'm left with a smaller number of commits that represent the actual separable changes. Then I'll start new branch(es) off of origin/master, and cherry-pick or rebase those particular commits.

@eschnett
Contributor Author

As much as I'd like to keep the discussion on @fastmath instead of git rebase, I can't help myself: git rebase destroys the ability to git blame. In my original history, I tested each commit; when I rebase, I do not test each rebased commit. (When git rebase succeeds without conflicts, there isn't even a prompt that lets me run the tests, and neither does Travis test these intermediate commits.) A rebased history is nice to look at and can be used to tell a nice "change request story", but that about covers its usefulness. If there's a large change, then it's nice to break it up into pieces for the reviewers.

But then, when you have such a large change, maybe you want to create separate pull requests in the first place?

In this case, there are about 200 lines of new code (and very few changes to existing code), plus test cases, plus changes to an auto-generated file. I would be very surprised if that level exceeds the review capability of the average Julia contributor. After all, if you know intrinsics.cpp well enough to judge whether the change is well-thought-out and well-implemented, then you belong to a very small group of experts.

... and are there further comments on @fastmath?

@mcprentiss
Contributor

I think this is wonderful. It lowers the barrier of entry into Julia, which is great. Loads of users are not going to want to spend a ton of time optimizing code, and the learning curve of mastering multiple macros can slow the adoption of the language. Having something like gfortran's -O3, which can capture most of the numerical optimizations, is a beautiful thing.

I think having a --fast-math command-line option is important. I would like to see command-line options accepted as they traditionally are for compilers. IDEs can adapt to Julia, as IPython did with success. As Julia gains popularity, this should be inevitable.

@tkelman
Contributor

tkelman commented Jan 6, 2015

Looks great, sorry about all the back and forth earlier. I guess merge this first, then #9482 can be rebased to take it into account?

AppVeyor is going to take a while to get to this, but this hadn't been showing many platform differences before, IIRC. The win64 failure looks unrelated.

@eschnett
Contributor Author

eschnett commented Jan 7, 2015

I rebased again to trigger AppVeyor again.

@ViralBShah
Member

Would be great if @ArchRobison can take a look.

@eschnett eschnett force-pushed the fast-math branch 3 times, most recently from 1f97e3f to 4e38b84 on January 9, 2015
@eschnett eschnett changed the title from "RFC: Implement @fastmath" to "Implement @fastmath" on Jan 9, 2015
@eschnett eschnett force-pushed the fast-math branch 3 times, most recently from 7289f72 to 605f065 on January 12, 2015
@eschnett
Contributor Author

ping?

@jakebolewski
Member

Does this need a rebase after #9732? Otherwise I'll merge.

@ArchRobison
Contributor

Looks good to me.

Was Julia's domain-checking for sqrt going to be turned off too for fast math?

function deriv!(u, du)
    n = length(u)
    dx = 1.0 / (n-1)
    @fastmath @inbounds du[1] = (u[2] - u[1]) / dx
Contributor

I'm wondering if it's worth showing the trick of wrapping several statements in a begin...end pair so that the @fastmath @inbounds only has to be written once.
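
For example (a sketch reusing the deriv! code above; the interior loop is my guess at the rest of that function):

@fastmath @inbounds begin
    du[1] = (u[2] - u[1]) / dx
    for i in 2:n-1
        du[i] = (u[i+1] - u[i-1]) / (2*dx)
    end
    du[n] = (u[n] - u[n-1]) / dx
end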

@jakebolewski
Member

Thanks @ArchRobison. I think you are right that it would be good to update the docs, but we can do that later.

jakebolewski added a commit that referenced this pull request Jan 13, 2015
@jakebolewski jakebolewski merged commit 65c0018 into JuliaLang:master Jan 13, 2015
@jakebolewski
Member

Thanks @eschnett!

@eschnett eschnett deleted the fast-math branch January 13, 2015 19:53
@jiahao
Member

jiahao commented Jan 18, 2015

Documentation note: for future reference, you want

:obj:`@fastmath`

not

:opt:`@fastmath`

The latter is an error in Sphinx.

Fixed in 3c0d8ed
